    Syn-QG: Syntactic and Shallow Semantic Rules for Question Generation

    Question Generation (QG) is fundamentally a simple syntactic transformation; however, many aspects of semantics influence what questions are good to form. We implement this observation by developing Syn-QG, a set of transparent syntactic rules leveraging universal dependencies, shallow semantic parsing, lexical resources, and custom rules which transform declarative sentences into question-answer pairs. We utilize PropBank argument descriptions and VerbNet state predicates to incorporate shallow semantic content, which helps generate questions of a descriptive nature and produce inferential and semantically richer questions than existing systems. In order to improve syntactic fluency and eliminate grammatically incorrect questions, we employ back-translation over the output of these syntactic rules. A set of crowd-sourced evaluations shows that our system can generate a larger number of highly grammatical and relevant questions than previous QG systems and that back-translation drastically improves grammaticality at a slight cost of generating irrelevant questions. Comment: Some of the results in the paper were incorrect.
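
    A minimal Python sketch of the kind of dependency-based rule the abstract describes, assuming spaCy for the universal-dependency parse; Syn-QG's actual rule set (PropBank, VerbNet, back-translation) is far richer, so this is only an illustration:

        # Illustrative single rule: replace the root verb's subject with a
        # wh-word to form a question-answer pair. Not Syn-QG's implementation.
        import spacy

        nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

        def subject_wh_question(sentence):
            doc = nlp(sentence)
            root = next((t for t in doc if t.dep_ == "ROOT"), None)
            subj = next((c for c in root.children if c.dep_ == "nsubj"), None) if root else None
            if subj is None:
                return None
            wh = "Who" if subj.ent_type_ == "PERSON" else "What"  # NER-dependent
            subj_tokens = set(subj.subtree)
            answer = " ".join(t.text for t in subj.subtree)
            rest = [t.text for t in doc if t not in subj_tokens and not t.is_punct]
            return wh + " " + " ".join(rest) + "?", answer

        print(subject_wh_question("Marie Curie discovered radium."))
        # expected: ('Who discovered radium?', 'Marie Curie')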

    CANDLE: Decomposing Conditional and Conjunctive Queries for Task-Oriented Dialogue Systems

    Domain-specific dialogue systems generally determine user intents by relying on sentence-level classifiers which mainly focus on single-action sentences. Such classifiers are not designed to effectively handle complex queries composed of conditional and sequential clauses that represent multiple actions. We attempt to decompose such queries into smaller single-action sub-queries that are reasonable for intent classifiers to understand in a dialogue pipeline. We release CANDLE (Conditional & AND type Expressions), a dataset consisting of 3124 utterances manually tagged with conditional and sequential labels, and demonstrate this decomposition by training two baseline taggers.
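
    As a toy illustration of the decomposition the paper targets (CANDLE's baselines are trained sequence taggers; the regex patterns below are our assumptions, not the paper's method):

        import re

        # Split conditional ("if ..., then ...") and sequential ("... and
        # then ...") queries into single-action sub-queries.
        def decompose(query):
            m = re.match(r"(?i)if (.+?)(?:,|\s+then)\s*(?:then\s+)?(.+)", query)
            if m:
                return [("CONDITION", m.group(1).strip()),
                        ("ACTION", m.group(2).strip())]
            parts = re.split(r"(?i)\s*,?\s*\b(?:and\s+then|then)\b\s*|\s*;\s*", query)
            return [("ACTION", p.strip()) for p in parts if p.strip()]

        print(decompose("If my flight is delayed, notify my wife"))
        # [('CONDITION', 'my flight is delayed'), ('ACTION', 'notify my wife')]
        print(decompose("Book a cab and then text me the fare"))
        # [('ACTION', 'Book a cab'), ('ACTION', 'text me the fare')]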

    Automatic Construction of Evaluation Suites for Natural Language Generation Datasets

    Machine learning approaches applied to NLP are often evaluated by summarizing their performance in a single number, for example accuracy. Since most test sets are constructed as an i.i.d. sample from the overall data, this approach overly simplifies the complexity of language and encourages overfitting to the head of the data distribution. As such, rare language phenomena or text about underrepresented groups are not equally included in the evaluation. To encourage more in-depth model analyses, researchers have proposed the use of multiple test sets, also called challenge sets, that assess specific capabilities of a model. In this paper, we develop a framework based on this idea which is able to generate controlled perturbations and identify subsets in text-to-scalar, text-to-text, or data-to-text settings. By applying this framework to the GEM generation benchmark, we propose an evaluation suite made of 80 challenge sets, demonstrate the kinds of analyses that it enables, and shed light on the limits of current generation models.
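
    A miniature of the two operations the framework automates, controlled perturbations and subset identification, with hypothetical function names (the paper's framework is far more general):

        # Controlled perturbation: replace digits to probe robustness.
        def perturb_numbers(example):
            out = dict(example)
            out["input"] = "".join("7" if c.isdigit() else c
                                   for c in example["input"])
            return out

        # Subset identification: keep only tail examples with long inputs.
        def subset_long_inputs(dataset, min_tokens=50):
            return [ex for ex in dataset
                    if len(ex["input"].split()) >= min_tokens]

        test_set = [{"input": "The team scored 3 goals in 1995.", "target": "..."}]
        challenge_set = [perturb_numbers(ex) for ex in test_set]
        print(challenge_set[0]["input"])  # The team scored 7 goals in 7777.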

    NusaCrowd: Open Source Initiative for Indonesian NLP Resources

    We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd enables the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are under-represented despite being widely spoken.
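
    One way to read "standardized data loaders": every dataset is exposed through one shared interface and schema. The registry below is a hypothetical sketch, not the actual nusacrowd package API:

        # Hypothetical loader registry; NOT the real nusacrowd API.
        LOADERS = {}

        def register(name):
            def wrap(fn):
                LOADERS[name] = fn
                return fn
            return wrap

        @register("toy_id_sentiment")
        def load_toy_id_sentiment():
            # Every loader returns records in one shared schema.
            return [{"text": "Filmnya bagus sekali", "label": "positive"}]

        data = LOADERS["toy_id_sentiment"]()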

    A Bird's-Eye Tutorial of Graph Attention Architectures

    Graph Neural Networks (GNNs) have made tremendous strides in performance on graph-structured problems, especially in the domains of natural language processing, computer vision, and recommender systems. Inspired by the success of the transformer architecture, there has been an ever-growing body of work on attention variants of GNNs attempting to advance the state of the art in many of these problems. Incorporating "attention" into graph mining has been viewed as a way to overcome the noisiness, heterogeneity, and complexity associated with graph-structured data, as well as to encode soft inductive bias. It is hence crucial and advantageous to study these variants from a bird's-eye view to assess their strengths and weaknesses. We provide a systematic and focused tutorial centered around attention-based GNNs in the hope of benefiting researchers dealing with graph-structured problems. Our tutorial looks at GNN variants from the point of view of the attention function and iteratively builds the reader's understanding of different graph attention variants. Comment: 8 pages, tutorial.
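
    For readers who want the baseline these variants modify, here is a compact NumPy sketch of a single-head GAT attention layer (Velickovic et al., 2018); the dense formulation and variable names are our own assumptions:

        import numpy as np

        def gat_attention(H, A, W, a_src, a_dst, slope=0.2):
            """H: (N, F) node features, A: (N, N) adjacency with self-loops,
            W: (F, Fp) weights, a_src/a_dst: (Fp,) attention vector halves."""
            Z = H @ W                                        # project features
            e = (Z @ a_src)[:, None] + (Z @ a_dst)[None, :]  # pairwise scores
            e = np.where(e > 0, e, slope * e)                # LeakyReLU
            e = np.where(A > 0, e, -1e9)                     # mask non-edges
            att = np.exp(e - e.max(axis=1, keepdims=True))
            att /= att.sum(axis=1, keepdims=True)            # softmax over neighbors
            return att @ Z                                   # aggregate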

    The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

    We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for which we are organizing a shared task at our ACL 2021 Workshop and to which we invite the entire NLG community to participate.

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
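
    "Calibration" here means how well a model's stated confidence matches its accuracy; a standard expected-calibration-error computation (our sketch, not the BIG-bench harness) makes the finding concrete:

        import numpy as np

        def expected_calibration_error(confidences, correct, n_bins=10):
            # Mean |accuracy - confidence| over confidence bins, weighted
            # by bin size; 0 means perfectly calibrated.
            confidences = np.asarray(confidences, dtype=float)
            correct = np.asarray(correct, dtype=float)
            bins = np.linspace(0.0, 1.0, n_bins + 1)
            ece = 0.0
            for lo, hi in zip(bins[:-1], bins[1:]):
                mask = (confidences > lo) & (confidences <= hi)
                if mask.any():
                    gap = abs(correct[mask].mean() - confidences[mask].mean())
                    ece += mask.mean() * gap
            return ece

        print(expected_calibration_error([0.9, 0.8, 0.6, 0.55], [1, 1, 0, 1]))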